Modeling Connectivity in Terms of Network Activity 



Lucas Antiqueira (1), Francisco Aparecido Rodrigues (1), and Luciano da Fontoura Costa (1,2)
(1) Institute of Physics at São Carlos, University of São Paulo, São Carlos, SP, Brazil
(2) National Institute of Science and Technology for Complex Systems, Brazil

We propose a new complex network model founded on growth, in which new connections
are established proportionally to the current dynamical activity of each node. The model can be
understood as a generalization of the Barabási-Albert static model. By using several topological
measurements, as well as optimal multivariate methods (canonical analysis and maximum likelihood
decision), we show that, among several theoretical types of networks including Watts-Strogatz
small-world networks, this new model provides the greatest compatibility with three real-world
cortical networks.



Several models have been developed in order to better
understand the structure and evolution of complex
networks, including the Erdős and Rényi random graph
model (ER) [1] and the Watts and Strogatz small-world
model (WS) [2]. Another widely known approach is
the Barabási-Albert (BA) model, developed as a means
to reproduce the scale-free feature observed in many
real-world networks, such as the World Wide Web
(WWW), the power grid and the actor collaboration
network. Scale-free networks present a power-law
distribution of degrees of the form $P(k) \sim k^{-\gamma}$, where $k$ is the
degree and the exponent $\gamma$ is system dependent. Such
networks are characterized by the existence of hubs, i.e.
nodes with particularly high degrees. The BA model is
based on growth and preferential attachment: starting
with a small network of $n_0$ nodes, new nodes are
sequentially added and connected to $m$ other nodes with
probability $\Pi(i) = k_i / \sum_j k_j$ ($k_i$ is the degree of node $i$), i.e.
nodes with higher degree have a proportionally higher
probability of receiving new connections [3].

Other models of network formation have been proposed
as modifications or generalizations of the BA
model, e.g. non-linear preferential attachment rules.
Nevertheless, these models tend to be solely based on the
structural features of the growing network, such as the
degree. In the work reported here, we have developed a
preferential attachment model based on a dynamical
feature of network nodes: namely, nodes with higher activity
have higher probability of establishing new connections.
We use the term "activity" to refer to the stationary
distribution $\vec{\pi}$ of the frequency of visits to nodes by a random
walk, quantified as $\pi_i = \lim_{t \to \infty} v_i / t$, where $v_i$ is the
number of times the random walker visited node $i$ after
$t$ walking steps. Therefore, the attachment rule of
our model is based on a dynamical process taking place
on the entire network rather than on static, topological
features of nodes.
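
The definition of activity as a long-run visiting frequency can be illustrated with a direct simulation. The following is only a minimal sketch: the function name `visit_frequencies`, the 4-node example network and the number of steps are assumptions made for illustration, not part of the original work.

```python
import random

def visit_frequencies(out_neighbors, steps, start=0, seed=0):
    """Estimate the activity of each node as v_i / t from one simulated walker.

    out_neighbors[i] lists the nodes reachable from node i through one
    directed edge; every node is assumed to have at least one out-edge.
    """
    rng = random.Random(seed)
    visits = [0] * len(out_neighbors)
    node = start
    for _ in range(steps):
        # Each out-edge of the current node is followed with equal probability.
        node = rng.choice(out_neighbors[node])
        visits[node] += 1
    return [v / steps for v in visits]

# Hypothetical strongly connected directed network with 4 nodes:
# 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 3, 3 -> 0.
example_network = [[1, 2], [2], [3], [0]]
frequencies = visit_frequencies(example_network, steps=200_000)
```

For this small example the exact stationary values are 2/7, 1/7, 2/7 and 2/7, so node 1, which is reached from only one of the two out-edges of node 0, ends up with half the activity of the other nodes.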

It is worth pointing out that the Activity-based Preferential
Attachment model (APA for short) introduced
here takes into account the more general case of directed
networks, as opposed to the undirected ones considered
in the BA model, so that in- and out-going edges are
assigned to each newly created node. It turns out that
if undirected edges are created in the APA model, the
BA model is reproduced, as a consequence of the fact
that the activity is perfectly correlated with the degree
in undirected networks. Therefore, the APA model
can be understood as a generalization of the BA model.
However, since in directed networks the activity is not in
general correlated with the degree, the type of topology
produced by the APA model depends strongly on the
specific distribution of frequencies of visits. As a matter of
fact, the correlations between in-degree and activity and
between out-degree and activity vary greatly from
network to network, and no exact and general relationship
between these quantities has been discovered yet, except
that full correlation between the frequency of visits and
the degree is obtained provided the in-degree is identical
to the out-degree for every node in the network
(observe that undirected networks are only a particular
case of this more general condition).

Some models based on dynamical features have also
been reported in the literature. Examples include models
driven by Prisoner's Dilemma dynamics and
degree of synchronizability [7]. Another approach to the
development of connectivity is Hebbian theory [8], where
two neurons are connected if a neuron repeatedly or
persistently takes part in firing the other neuron. Such a
model has been considered in the study of complex
networks, but we are not aware of related growing models.
Indeed, no model based on preferential attachment that
takes into account individual node activity seems to have
been proposed yet. Our main motivation to develop such
a model is that in many networks the dynamics is more
closely related to the relevance of each node than the
degree or other traditional structural node measurements.
For instance, the frequency of visits to each web page
in the WWW is a particularly efficient indicator of page
relevance (viz. Google's PageRank [10]). Therefore, a
measurement of the dynamics taking place in a network
constitutes a more direct and reliable indicator of node
relevance than just in- or out-degrees, especially for
directed networks, where the activity tends to be
uncorrelated with the degree. We have chosen the traditional
random walk, i.e. the probability of following a link is
inversely proportional to the out-degree of the current
node, because it is one of the most important models of
dynamics in physics and in many other fields.

In the present work, we show that several directed
networks intrinsically underlying dynamical processes, such
as the cortical networks of the cat and the macaque
[13], are better reproduced by the APA model than by other
classical complex network models (e.g. WS and BA). Our
methodology is based on the characterization of each
network in terms of a set of network measurements and on
the subsequent application of canonical variable analysis
and Bayesian decision theory [14, 15].

A directed network of $N$ nodes can be completely
represented by the adjacency matrix $A$ of order $N \times N$, whose
element $A(j,i)$ is equal to 1 if and only if there exists a
directed edge pointing from node $i$ to node $j$ (otherwise,
$A(j,i) = 0$). The in-degree of node $i$, i.e. the number
of connections it receives, is equal to $k_i^{in} = \sum_j A(i,j)$,
and its out-degree, which is equal to the total number
of edges leaving it, is given as $k_i^{out} = \sum_j A(j,i)$ [13].
The dynamics of a random walk is entirely determined
by the stochastic (or transition) matrix $P$ with elements
$P(j,i) = 1/k_i^{out}$, i.e. the probability of the walker
visiting node $j$ at step $t + 1$ after being at node $i$ at step
$t$ is equal to $P(j,i)$. Notice that $A(j,i) = 0$ implies
$P(j,i) = 0$. In other words, the next step of the walker
depends only on its current state (Markov chain). The
stationary, or steady-state, distribution of probabilities
of being at each node, i.e. $\vec{\pi}$, can be obtained by solving
$P\vec{\pi} = \vec{\pi}$. In particular, $\vec{\pi}$ is the eigenvector associated
with the eigenvalue 1 of $P$ [11]. We define $\sum_i \pi_i = 1$
for proper statistical normalization. For this distribution
to be unique, $P$ needs to be irreducible, i.e. the
network must be strongly connected, which happens when
every node can reach every other node in the network
through a finite path. For undirected networks (when
$A$ is a symmetric matrix), the stationary distribution of
a random walk can be directly obtained from the
degree distribution as $\pi_i = k_i / \sum_j k_j$, whereas for
directed networks this perfect correlation only happens
when, for every node $i$, $k_i^{in} = k_i^{out}$.
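
As a concrete illustration of the eigenvector formulation, the sketch below builds the transition matrix from an adjacency matrix and extracts the eigenvector associated with eigenvalue 1. It assumes NumPy; the function names and the small example network are hypothetical, and the code follows the convention that entry (j, i) of the adjacency matrix marks an edge from node i to node j.

```python
import numpy as np

def transition_matrix(A):
    """Column-stochastic matrix with entry (j, i) equal to A[j, i] / k_out[i]."""
    A = np.asarray(A, dtype=float)
    k_out = A.sum(axis=0)  # column sums: number of edges leaving each node
    if np.any(k_out == 0):
        raise ValueError("every node needs at least one out-edge")
    return A / k_out       # broadcasting divides column i by k_out[i]

def stationary_distribution(A):
    """Eigenvector of the transition matrix for eigenvalue 1, summing to 1."""
    P = transition_matrix(A)
    eigenvalues, eigenvectors = np.linalg.eig(P)
    index = np.argmin(np.abs(eigenvalues - 1.0))
    pi = np.real(eigenvectors[:, index])
    return pi / pi.sum()   # fixes both the scale and the sign

# Hypothetical undirected network: a triangle (0, 1, 2) plus node 3 tied to node 2.
A_undirected = np.array([[0, 1, 1, 0],
                         [1, 0, 1, 0],
                         [1, 1, 0, 1],
                         [0, 0, 1, 0]])
pi_undirected = stationary_distribution(A_undirected)
degrees = A_undirected.sum(axis=0)
```

In this undirected example the degrees are 2, 2, 3 and 1, and the stationary distribution equals the degrees divided by their sum, as stated for symmetric adjacency matrices.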

The growth of the APA model starts at $T = 0$ with a
random directed network of $n_0$ nodes and initial walker
distribution $\vec{\pi}^0$. At each subsequent time step $T$
($T \leq N - n_0$) a new node $i$ is added to the network and
directed edges are independently created from $m$ older
nodes ($m \ll n_0$) to the current node $i$ (in-edges) and
from node $i$ to $m$ other nodes (out-edges). The general
idea behind this model is that a new node would want
to establish in-edges with highly active nodes in order to
receive a considerable share of activity from the outset of
its lifetime. Therefore, in-edges are created between node
$i$ and other nodes following the preferential attachment
probability $\Pi(i) = \pi_i^{T-1}$. Because there is no out-edge
attachment rule that could intuitively increase the
activity of a new node, we have proposed two approaches: a
uniform rule $\Pi(i) = 1/N$, and the preferential rule
already used for in-edges. These two approaches
divide APA into two model variations: the original
APA, which uses preferential attachment for both
edge directions, and APA', which applies the
preferential attachment rule only to the in-edges.
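
The growth procedure can be sketched in a few lines of NumPy. This is an illustrative sketch only, not the authors' implementation, and it makes several assumptions that are not in the text: the initial network is a directed ring (chosen so that it is strongly connected), the endpoints of the edges of a new node are drawn without repetition to avoid multiple edges, the uniform rule is taken over the nodes already present, and the stationary distribution is recomputed from scratch at each step, which is only practical for small networks.

```python
import numpy as np

def stationary(A):
    """Stationary distribution of the random walk on A, where A[j, i] = 1 for an edge i -> j."""
    n = A.shape[0]
    P = A / A.sum(axis=0)                       # column i divided by k_out[i]
    # Solve (P - I) pi = 0 together with sum(pi) = 1 in the least-squares sense.
    system = np.vstack([P - np.eye(n), np.ones((1, n))])
    rhs = np.zeros(n + 1)
    rhs[-1] = 1.0
    pi, *_ = np.linalg.lstsq(system, rhs, rcond=None)
    pi = np.clip(pi, 0.0, None)                 # remove tiny negative round-off
    return pi / pi.sum()

def grow_apa(n_total, n0=5, m=2, preferential_out=True, seed=0):
    """Grow a directed network; preferential_out=False gives the APA' variant."""
    rng = np.random.default_rng(seed)
    A = np.zeros((n_total, n_total))
    for i in range(n0):                         # assumed seed network: directed ring
        A[(i + 1) % n0, i] = 1.0
    for new in range(n0, n_total):
        pi = stationary(A[:new, :new])          # activity before the new node arrives
        sources = rng.choice(new, size=m, replace=False, p=pi)
        p_out = pi if preferential_out else None    # None means uniform choice
        targets = rng.choice(new, size=m, replace=False, p=p_out)
        A[new, sources] = 1.0                   # in-edges: source -> new node
        A[targets, new] = 1.0                   # out-edges: new node -> target
    return A

apa_network = grow_apa(60, n0=5, m=2, preferential_out=True, seed=1)
apa_prime_network = grow_apa(60, n0=5, m=2, preferential_out=False, seed=1)
```

Because every new node receives m in-edges and sends m out-edges, each step adds 2m edges, and the network stays strongly connected as long as the seed network is.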

Figure 1 illustrates a small network being constructed
by the APA approach with $m = 2$, where a new node
tends to be connected with highly active nodes regarding
both edge directions. The stationary frequencies of visits
of each node are shown by gray levels. Notice that for
both APA and APA' networks, the average in- and
out-degrees are $\langle k^{in} \rangle = \langle k^{out} \rangle = 2m$. The BA model is
exactly reproduced by considering the APA model with
undirected edges, since $\Pi(i) = \pi_i^{T-1} = k_i / \sum_j k_j$ for
every undirected and connected network. Consequently,
the APA model can be understood as a generalization of
the BA model.




FIG. 1: A step $T$ along the growth of a network by using the
APA model. Ten nodes already exist in the network, whose
activities $\pi_i^{T-1}$ are depicted by gray levels, and a new node
tends to make connections with highly active nodes.

Computer simulations were performed in order to
characterize the APA and APA' models: 150 realizations
of each model with $N = 2500$ and $\langle k^{in} \rangle = \langle k^{out} \rangle = 6$
were generated, and the respective average results are
shown in Figure 2 and Table I. The in- and out-degree
distributions are included in Figure 2, as well as the
activity distribution. The APA model yields power-law
degree distributions for both edge directions, although
these distributions deviate slightly from a power law
at high degrees. Nevertheless, the
obtained distributions clearly show the existence of hubs
of in- and out-degree in the APA model. For
APA' networks, these distributions deviate greatly
from a power law, with steep decays for high degrees
that prevent the occurrence of nodes as highly connected
as in the APA networks. The activity distributions
reveal that APA networks are also scale-free in this
respect, whereas APA' again deviates from a power law,
not showing hubs of activity as in the APA model.
Networks with different sizes and average degrees were also
analyzed, yielding similar results.

FIG. 2: Average (a) in-degree, (b) out-degree and (c) activity distributions of models APA and APA'. Standard deviations are
too small for visualization.



Table I shows the values obtained for a set of
measurements calculated for APA and APA' (each model with
150 realizations, $N = 2500$ and $\langle k^{in} \rangle = \langle k^{out} \rangle = 6$). The
employed measurements are [14, 16] (i) clustering
coefficient $C$, (ii) length of shortest paths $\ell$, (iii) assortativity
between in- and out-degrees ($r^{ii}$ and $r^{oo}$, respectively)
and (iv) reciprocity $\rho$, where averages were taken for each
network model. The first three measurements are
well-known [3], while the reciprocity quantifies to what
extent a directed network contains symmetric links, i.e.
edges that connect pairs of nodes in both directions.
We also computed the minimum reciprocity $\rho_{min}$, which
depends on the specific edge density of a network.
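
The text does not spell out the reciprocity formula, so the following sketch assumes the widely used density-corrected definition of Garlaschelli and Loffredo: with $r$ the fraction of edges that have a reverse counterpart and $\bar{a}$ the edge density, $\rho = (r - \bar{a})/(1 - \bar{a})$ and $\rho_{min} = -\bar{a}/(1 - \bar{a})$. The function name is hypothetical, and the network is assumed to have no self-loops.

```python
import numpy as np

def reciprocity(A):
    """Return (rho, rho_min) for a directed 0/1 adjacency matrix without self-loops.

    Assumed definition: rho = (r - a) / (1 - a), where r is the fraction of
    edges that are reciprocated and a = L / (N (N - 1)) is the edge density.
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    n_edges = A.sum()
    # A * A.T is 1 exactly where edges exist in both directions, so the sum
    # counts every reciprocated edge once per direction.
    n_reciprocated = (A * A.T).sum()
    r = n_reciprocated / n_edges
    density = n_edges / (n * (n - 1))
    rho = (r - density) / (1.0 - density)
    rho_min = -density / (1.0 - density)
    return rho, rho_min

# Hypothetical examples: a directed 3-cycle and a single mutual pair among 3 nodes.
cycle = np.array([[0, 0, 1],
                  [1, 0, 0],
                  [0, 1, 0]])
mutual_pair = np.array([[0, 1, 0],
                        [1, 0, 0],
                        [0, 0, 0]])
rho_cycle, rho_min_cycle = reciprocity(cycle)
rho_pair, rho_min_pair = reciprocity(mutual_pair)
```

Under this definition, a network with N = 2500 nodes and average in-degree 6 has 15000 edges and a density of about 0.0024, which gives a minimum reciprocity of about -0.0024, in line with the values reported for $\rho_{min}$ in Table I.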



TABLE I: Average and standard deviations of the clustering
coefficient ($C$), length of shortest paths ($\ell$), in- and
out-assortativities ($r^{ii}$ and $r^{oo}$) and reciprocity ($\rho$ and $\rho_{min}$), for
models APA and APA'.

                APA model               APA' model
$C$             0.0306 ± 0.0023         0.0090 ± 0.0004
$\ell$          3.7453 ± 0.0240         4.2433 ± 0.0075
$r^{ii}$        -0.0584 ± 0.0063        0.1512 ± 0.0095
$r^{oo}$        -0.0500 ± 0.0071        0.2094 ± 0.0107
$\rho$          0.0123 ± 0.0015         0.0063 ± 0.0009
$\rho_{min}$    -0.0024 ± 10^{-6}       -0.0024 ± 10^{-6}



The results included in Table I show that both APA
and APA' networks have particularly low clustering
coefficients (especially APA') coexisting with short path
lengths (around 4), therefore resembling the BA model.
The assortativity values show no correlation between
degrees in the APA model (as in the BA model), whereas
slightly positive correlations appear in APA'. Reciprocity
results indicate that APA and APA' networks are
areciprocal ($\rho \approx 0$), which means that edges in these networks
do not tend to be symmetric (reciprocity results are also
particularly close to the respective minima). Similar
characteristics were observed in networks of different
sizes and densities (results not shown).



In order to accurately compare the models APA and
APA' with other models, and also to classify real-world
networks with reference to a set of putative models, it is
necessary to consider a larger number of network
measurements. In the current letter, we adopted a set of
nine measurements to quantify the topological properties
of complex networks: (i) average node degree, (ii) average
clustering coefficient, (iii) average shortest path length,
(iv) in- and (v) out-assortativity coefficients, (vi) central
point dominance, (vii) average betweenness centrality,
(viii) hierarchical clustering coefficient and (ix) hierarchical
convergence ratio. All these measurements are
discussed in [14].

The classification of real-world networks involves the
consideration of different network models, each one
generating specific types of topologies. In this way, we
took into account the following models (as well as the
APA and APA' models): (i) Erdős-Rényi random graph
(ER) [1], which generates networks with random
placement of connections, (ii) small-world model of Watts and
Strogatz (WS), which produces networks whose
structures lie between a regular and a random network [2],
(iii) Barabási-Albert scale-free model (BA), which
constructs networks having a power-law degree
distribution and (iv) a geographical model (GG), where nodes
next to each other in a given metric space are more likely
to be connected by an edge [14].

Since the network classification requires a large set of
model realizations in order to minimize statistical
fluctuations, we applied canonical variable analysis to reduce
the dimensionality of the $M$-dimensional measurement
space while maximizing the separation between the
network models. This multivariate statistical technique is an
extension of principal component analysis that finds
optimal projections of a set of measurements, reducing
the dimensionality of the original space
while maximizing the separation between the known
categories (i.e. network models) [13]. The computation of
the canonical variable analysis is based on the so-called
inter- and intra-class matrices, as well as on the
diagonalization of the product between the inverse of the
intra-class matrix and the inter-class matrix. The selection of
the $d$ eigenvectors corresponding to the highest absolute
eigenvalues of this matrix product allows the projection
of the measurements into a $d$-dimensional space ($d < M$).
In the current work, we considered $d = 2$.
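
The computation described above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions: the function name `canonical_projection` is hypothetical, the inter-class matrix is weighted by the number of samples in each class (one common convention), and the intra-class matrix is assumed to be invertible.

```python
import numpy as np

def canonical_projection(X, labels, d=2):
    """Project the rows of X (samples x M measurements) onto d canonical variables."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n_features = X.shape[1]
    global_mean = X.mean(axis=0)
    S_intra = np.zeros((n_features, n_features))
    S_inter = np.zeros((n_features, n_features))
    for c in np.unique(labels):
        Xc = X[labels == c]
        class_mean = Xc.mean(axis=0)
        centered = Xc - class_mean
        S_intra += centered.T @ centered             # scatter inside class c
        diff = (class_mean - global_mean)[:, None]
        S_inter += len(Xc) * (diff @ diff.T)         # scatter between class means
    # Diagonalize inverse(intra-class) times inter-class.
    eigenvalues, eigenvectors = np.linalg.eig(np.linalg.inv(S_intra) @ S_inter)
    order = np.argsort(-np.abs(eigenvalues))[:d]     # d largest absolute eigenvalues
    W = np.real(eigenvectors[:, order])
    return X @ W, W                                  # inner products with the eigenvectors
```

A typical call would pass one row per model realization, with the measurement values as columns and the model name as label, and then plot the two returned coordinates for each realization.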

The projection is then performed by calculating the
inner products between the original feature (network
measurement) vectors and the two eigenvectors
corresponding to the highest eigenvalues. After the projection, we
estimated the probability density of the projected points
in the two-dimensional space for each class, by using
the non-parametric method called Parzen
windows [15]. This method starts by representing each point
in the projection as a Dirac delta function. These deltas
are then convolved with a normalized Gaussian function,
yielding the estimated distributions, which are then
used for the classification of a given real-world
network. Notice that every model realization has the same
number of vertices and approximately the same
average degree as the respective real-world network.
Maximum likelihood decision theory was then applied in
order to classify each real-world network by associating it
with the model that results in the largest overall
probability. Observe that equiprobability of the mass
probabilities is guaranteed by using the same number of
realizations for each category.
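
The density estimation and decision steps can be illustrated with the short sketch below. It is an assumption-laden example rather than the procedure actually used: the function names are hypothetical, and the kernel width `sigma` is an arbitrary illustrative value, since the text does not state the width of the Gaussian.

```python
import numpy as np

def parzen_density(point, samples, sigma):
    """Gaussian Parzen-window density estimate at `point` from projected `samples`."""
    samples = np.asarray(samples, dtype=float)
    squared_distances = np.sum((samples - np.asarray(point, dtype=float)) ** 2, axis=1)
    dim = samples.shape[1]
    normalization = (2.0 * np.pi * sigma ** 2) ** (dim / 2.0)
    # Averaging one normalized Gaussian per sample is equivalent to convolving
    # the sum of Dirac deltas with the kernel.
    return np.mean(np.exp(-squared_distances / (2.0 * sigma ** 2))) / normalization

def classify(point, samples_by_model, sigma=0.5):
    """Maximum likelihood decision: pick the model with the largest estimated density."""
    densities = {name: parzen_density(point, samples, sigma)
                 for name, samples in samples_by_model.items()}
    best_model = max(densities, key=densities.get)
    return best_model, densities
```

Here `samples_by_model` would map each model name to the projected coordinates of its realizations, and `point` would be the projection of the real-world network. Using the same number of realizations for every model keeps the prior probabilities equal, so comparing densities is enough.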

The classification methodology was applied to three
cortical networks, namely (i) cat cortex containing $N = 52$
cortical areas [12], (ii) cat cortex including all cortical
and thalamic areas ($N = 95$) and (iii) macaque
large-scale cortical network, including visual and
sensorimotor areas ($N = 47$) [13]; to two food webs, (i) Canton
(a pasture grassland in New Zealand, $N = 110$) [17]
and (ii) Kyeburn (a tussock grassland in New Zealand,
$N = 98$) [18]; and to the Roget thesaurus network
($N = 1022$) [19]. For short, the first two networks are
called here "cat52" and "cat95", respectively. Figure 3
presents the classification of all three cortical networks
and the Roget thesaurus, taking into account 100
realizations of each network model. All cortical networks
fall inside the APA' region in the canonical projections,
which means that the APA' model is the most likely in
those cases. On the other hand, the Roget
thesaurus network and the food webs
have been classified as small-world, as could be
expected.



Therefore, the proposed activity-based preferential
attachment model (APA') proved to be compatible with
cortical networks. Nevertheless, it has been claimed that
brain networks can be well modeled by WS networks (e.g.
[21], [22], [23]), which is not supported by the results of
the current work. Furthermore, the "cat95" network was
previously analyzed using the stationary random walk
distribution to simulate cortical activation [24], and the
out-degrees were found to be highly correlated with
random walk activity.

The obtained results suggest that the brain is organized
in order to favor connections with a small number
of highly active regions. In fact, since this mechanism
is a generalization of the well-established preferential
attachment model, our results indicate that the connections in the
brain would not be guided by structural aspects (i.e. the
connectivity), but rather by dynamics (i.e. frequency of
visits by random walks). This process can be related to
the Hebbian theory [8], since nodes that receive walks
more frequently tend to receive repeated and persistent
stimulation and therefore establish a large number of
connections.

All in all, the current work described a new model of
complex networks which is founded on a growth
mechanism favoring attachments proportional to the current
activity of each node. This approach can also be
understood as a generalization of the BA model. As such,
the proposed approach is intrinsically suited for modeling
complex systems whose connectivity is determined
by the respective dynamics. The combination of
several topological measurements with
the optimal methods of canonical analysis and maximum
likelihood decision theory paved the way to a sound
comparison between real-world networks and several
putative theoretical models, namely APA, APA', ER, WS,
BA and GG. The proposed model turned out to exhibit
a remarkable compatibility with three real-world cortical
networks. Future developments include the application
to modeling neuronal networks where each node
represents a neuron. It would also be interesting to consider
attachment rules founded on transient, rather than
stationary, dynamics.

FIG. 3: Classification of real-world networks: (a) cat cortex including 52 cortical areas, (b) cat cortex containing all cortical
and thalamic areas, (c) macaque cortical network, and (d) Roget thesaurus network. Each classified network is indicated by
an arrow and a black circle.